library(tidyverse)
library(tigris)
library(censusapi)
library(sf)
library(mapview)
library(plotly)

options(
  tigris_class = "sf",
  tigris_use_cache = T # This stores tigris loads somewhere on your machine for much faster personal loading.
)

This script uses an output from load_safegraph.Rmd which processes down the daily social distancing dataset from Safegraph to just block groups in the Bay Area.

Loading geometries

These were the high-level lists of Bay county names, county shapes, and block group IDs from the other script, in case they become useful again.

bay_county_names <-
  c(
    "Alameda",
    "Contra Costa",
    "Marin",
    "Napa",
    "San Francisco",
    "San Mateo",
    "Santa Clara",
    "Solano",
    "Sonoma"
  )

bay_blockgroups <-
  bay_county_names %>% 
  map(function(x){
    block_groups("CA",x,progress_bar=F) %>% 
      pull(GEOID)
  }) %>% unlist()

bay_counties <-
  counties("CA", cb = F, progress_bar=F) %>% 
  filter(NAME %in% bay_county_names)

Also, as part of starting to focus on one of our partners, SJ, here’s a quick way to get all the block groups in SJ. If you want to practice this analysis with another location, or see a specific opportunity to support another partner, then this is where you’d start modifications.

scc_blockgroups <-
  block_groups("CA","Santa Clara", cb=F, progress_bar=F)

sj_boundary <-
  places("CA", cb=F, progress_bar=F) %>% 
  filter(NAME == "San Jose")

sj_blockgroups <-
  scc_blockgroups %>% 
  dplyr::select(GEOID) %>%
  st_join(sj_boundary %>% dplyr::select(geometry), left = F) %>%
  st_set_geometry(NULL) %>%
  left_join(scc_blockgroups %>% dplyr::select(GEOID)) %>% 
  st_as_sf()

mapview(sj_blockgroups)+mapview(sj_boundary,alpha.region= 0, color = "red", lwd = 4)

Note that in this case, unlike the R Basics example, I don’t convert to centroid before the st_join(), because there are random holes within SJ official geopolitical boundaries (whole different story) that certainly shouldn’t be removed from our analysis. So the st_join() holds onto anything that even touches the SJ boundary. But note there’s a huge rural block group to the east of SJ that comes along for the ride that’s practically the size of SJ, which you may decide to manually remove for the sake of better visualiation for now.

sj_blockgroups <-
  sj_blockgroups %>% 
  filter(!GEOID %in% c("060855135001"))

mapview(sj_blockgroups)

Loading social distancing data

Loading Bay social distancing data will still take some time, but much faster than loading the whole US. We’ll quickly create a SJ version too. When you open this file, you’ll probably see that the first few steps have been commented out to reduce knitting time, but you should practice uncommenting and running all to understand how it works.

Now’s a good time to also remind you that per your data sharing agreement, raw Safegraph data like this should definitely stay within the F Drive Restricted Data Library. You’ll see at the end the level of aggregation that is then OK to save outside of the F Drive. The gray area in-between will come down to teaching team judgment.

As you start to save RDS files, be very careful about overwriting existing files. We’re going to be backing things up a lot, but just be aware and definitely notify us if you think something has gone wrong.

# bay_socialdistancing <- 
#   readRDS("F:/Restricted Data Library/Safegraph/covid19analysis/bay_socialdistancing.rds")
# 
# sj_socialdistancing <- 
#   bay_socialdistancing %>% 
#   filter(origin_census_block_group %in% sj_blockgroups$GEOID)
# 
# saveRDS(sj_socialdistancing, file = "F:/Restricted Data Library/Safegraph/covid19analysis/sj_socialdistancing.rds")
sj_socialdistancing <- readRDS("F:/Restricted Data Library/Safegraph/covid19analysis/sj_socialdistancing.rds")

We will eventually join the Safegraph data to our geographies, but right now since the data includes many rows for each block group (individual days), this is not a good time to do it.

Analyzing social distance data

We invite you to explore this dataset in detail, referring to the online explainer to understand the fields. The following is a simple exercise to get the number of devices staying “completely home” each day, divide it by the total devices, aggregate that information up to the SJ level, and then plot as a time series. We’ll also plot the average for the last three days as a map.

# helps to quickly check if fields have missing data before you do math.
sum(is.na(sj_socialdistancing$completely_home_device_count))
## [1] 0
sum(is.na(sj_socialdistancing$device_count))
## [1] 0
sj_percenthome_bg <-
  sj_socialdistancing %>%
  mutate(
    `% Completely at Home` = completely_home_device_count/device_count,
    date = date_range_start %>%  substr(1,10) %>% as.Date()
  ) %>% 
  dplyr::select(origin_census_block_group,date,completely_home_device_count,device_count,`% Completely at Home`)

sj_percenthome_time <-
  sj_percenthome_bg %>% 
  group_by(date) %>% 
  summarize(
    completely_home_device_count = sum(completely_home_device_count),
    device_count = sum(device_count)
  ) %>% 
  mutate(`% Completely at Home` = (completely_home_device_count/device_count*100) %>% round(1))

chart <-
  sj_percenthome_time %>% 
  ggplot(
    aes(
      x = date,
      y = `% Completely at Home`
    )
  ) + 
  geom_line() +
  labs(
    x = "Date",
    y = "Percent staying completely at home",
    title = "San Jose, CA"
  )

chart %>% ggplotly()

We can add the non-pharmaceutical interventions (NPIs) in the graph. Here’s info about SCC’s shelter in place order, 3/16/20.

chart <-
  chart +
  geom_vline(
    aes(
      xintercept = "2020-03-16" %>% as.Date() %>% as.numeric() # ggplotly needs vlines to be numeric
    ), 
    linetype="dotted"
  ) +
  annotate("text",label= "Santa Clara County\nShelter-in-Place Order", color = "black", x=(("2020-03-16" %>% as.Date())-6), y=55, size=2)

chart %>% ggplotly()
sj_percenthome_map <-
  sj_percenthome_bg %>% 
  filter(date > max(date)-3) %>% 
  group_by(origin_census_block_group) %>%
  summarize(`% Completely at Home` = round(mean(`% Completely at Home`)*100,1)) %>% 
  left_join(sj_blockgroups, by = c("origin_census_block_group" = "GEOID")) %>% 
  st_as_sf()

mapview(sj_percenthome_map, zcol = "% Completely at Home", layer.name = "% Completely<br>at Home,<br>Average 4/1-4/3")

What other kinds of actionable insights, based on your observations of needs on the ground in SJ and ultimately engagement of our stakeholders, can you create out of this dataset?